Data Detangle

How Knowledge Graphs Bring Order to the HRA’s Data Diversity

It takes a lot of data to construct the Human Reference Atlas (HRA).
But not all data is created equal!
HRA data comes from many different sources, which may use different technologies and follow different protocols.
The data itself comes in many different formats, some of which may require specific software to read.
Some of it is old data, and some of it is new.
It may have been mixed with other data or repurposed.
Some data might be open research data, available to all, while other data might carry restrictions that limit access, use, and distribution.
For the Human Reference Atlas (HRA), we need the ability to easily find the data we want, utilize it for our purposes, and share it as widely as possible.
[Figure: pure data versus structured data]
Of course, that data needs to be structured so that machines can read it.
Ideally, though, that data structure would also be understandable to humans.
It would not only show what data exists in the HRA but also how pieces of that data relate to each other.
[Figure: Tuft cell located_in Large Intestine, which is part_of Digestive System; Tuft cell located_in Trachea, which is part_of Respiratory System]
By labeling our data and connecting our labeled nodes with relational links,
we put our data into context and create a framework for moving from data
to knowledge
to insight.
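The labeled nodes and relational links from the tuft cell example can be written down directly as data. The following is a minimal Python sketch, not the HRA's actual implementation: the node and link labels come from the figure, while the names `EDGES`, `targets`, and `systems_containing` are assumptions made for illustration.

```python
# Labeled links taken from the figure: (source node, link label, target node).
EDGES = [
    ("Tuft cell", "located_in", "Large Intestine"),
    ("Large Intestine", "part_of", "Digestive System"),
    ("Tuft cell", "located_in", "Trachea"),
    ("Trachea", "part_of", "Respiratory System"),
]


def targets(source, label, edges=EDGES):
    """Return every node reached from `source` over a link with this label."""
    return [t for s, l, t in edges if s == source and l == label]


def systems_containing(cell, edges=EDGES):
    """Follow located_in, then part_of, to find the systems a cell belongs to."""
    systems = set()
    for organ in targets(cell, "located_in", edges):
        systems.update(targets(organ, "part_of", edges))
    return sorted(systems)


print(systems_containing("Tuft cell"))
# ['Digestive System', 'Respiratory System']
```

Following two labeled links answers a question that no single row of data states outright, which is a small instance of the step from data to knowledge.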
The type of data structure we are moving towards here is known as a “knowledge graph,” and knowledge graphs are a lot more common than you might think.
Google popularized the term in 2012, when it introduced its own Knowledge Graph.
Now major companies like Facebook, Amazon, and Netflix all use knowledge graphs to represent relationships between people, products, and concepts.
[Figure: "Me" at the center of a graph, linked to things I like: my puppy, ice cream, fluffy pillows, Mom and Dad, home, food, bookstores, my city]
A knowledge graph gathers all the things that are important to a particular group or organization.
These things can be people, places, entities, concepts, databases, documents—really just about anything.
Each of those entities is assigned a node. Then the knowledge graph organizes all those nodes into a network of interrelations.
[Figure: research metadata, biological data, digital objects]
[Figure: a triple of subject, predicate, and object. In general: Organ A, has relation to, Organ B. Example: Renal cortex, is a part of, Kidney]
Using the Resource Description Framework (RDF), each relationship is expressed as a subject, a predicate, and an object, with the predicate expressing the relationship between the two entities.
This grouping is called a triple, and the relation between an anatomical structure and its parent organ might look like this.
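To make the idea of a triple concrete, here is a small Python sketch that stores the renal cortex example and prints it in N-Triples, one of the standard text formats for RDF. The `Triple` type, the helper functions, and the `https://example.org/hra-sketch/` namespace are all assumptions made for illustration; they are not the identifiers the HRA actually uses.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical namespace for illustration only; not a real HRA identifier scheme.
BASE = "https://example.org/hra-sketch/"


def iri(label):
    """Turn a human-readable label into an IRI in angle brackets."""
    return "<" + BASE + label.strip().lower().replace(" ", "_") + ">"


def to_ntriples(triple):
    """Serialize one triple as a single N-Triples line: subject predicate object ."""
    return f"{iri(triple.subject)} {iri(triple.predicate)} {iri(triple.object)} ."


renal = Triple("Renal cortex", "is a part of", "Kidney")
print(to_ntriples(renal))
```

In practice, RDF data usually reuses identifiers from shared vocabularies and ontologies rather than minting them from labels, which is what lets separate graphs refer to the same thing.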
[Figure: 3D reference organ, Left Female Kidney]
Let's see how this might look for a particular digital object created for the Human Reference Atlas.
Here's a 3D reference organ for the left female kidney.
And here's how it appears in the knowledge graph.
The "subject" entries in the left column all point to the same thing: the HRA's 3D reference organ of the left female kidney.
The "object" column lists all the other data in the HRA that the reference organ is connected to.
And the "predicate" column indicates the nature of that relationship.
A closer look at these predicates reveals relationships such as the creation date, version number, the raw data the 3D kidney was derived from, and many more.
What we see here is actually a network of nodes and edges: our kidney reference organ is the central node, and all its related data is connected to it by labeled edges.
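One way to picture this central node and its labeled edges is to group a list of triples by their shared subject. The Python sketch below does that under stated assumptions: the predicate names paraphrase the relationships mentioned above, every object value is a made-up placeholder rather than real HRA data, and `star_network` is a hypothetical helper.

```python
from collections import defaultdict

SUBJECT = "3D reference organ: left female kidney"

# Predicate names paraphrase the relationships the text mentions;
# every object value is a made-up placeholder, not real HRA data.
TRIPLES = [
    (SUBJECT, "creation_date", "PLACEHOLDER-DATE"),
    (SUBJECT, "version", "PLACEHOLDER-VERSION"),
    (SUBJECT, "derived_from", "PLACEHOLDER-RAW-DATA"),
]


def star_network(triples):
    """Group triples by subject: {central node: {edge label: [connected nodes]}}."""
    network = defaultdict(dict)
    for subject, predicate, obj in triples:
        # A predicate may point to several objects, so collect them in a list.
        network[subject].setdefault(predicate, []).append(obj)
    return dict(network)


for center, edges in star_network(TRIPLES).items():
    print(center)
    for label, nodes in edges.items():
        print(f"  --{label}--> {', '.join(nodes)}")
```

Because every row shares the same subject, the table of triples and the star-shaped network are two views of the same data.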
Of course, this is only one network. There are over 500 digital objects currently in the HRA, each with its own network. And each network is connected to all the others.
Using a knowledge graph helps us structure the many different types of data that power the HRA.
It will also allow us to link up with other information networks to create a wide and radically open web of knowledge about the human body.